Predicting overall performance in Ironman 70.3 age group triathletes through split disciplines

Knowing which discipline contributes most to a triathlon performance is important to plan race pacing properly. To date, we know that the running split is the most decisive discipline in the Olympic distance triathlon, and the cycling split is the most important discipline in the full-distance Ironman® triathlon. However, we have no knowledge of the Ironman® 70.3. This study intended to determine the most crucial discipline in age group athletes competing from 2004 to 2020 in a total of 787 Ironman® 70.3 races. A total of 823,459 athletes (198,066 women and 625,393 men) from 240 different countries were analyzed and recorded in 5-year age groups, from 18 to 75 + years. Correlation analysis, multiple linear regression, and two-way ANOVA were applied, considering p < 0.05. No differences in the regression analysis between the contributions of the swimming, cycling, and running splits could be found for all age groups. However, the correlation analysis showed stronger associations of the cycling and running split times than the swimming split times with overall race times and a smaller difference in swimming performance between males and females in age groups 50 years and older. For age group triathletes competing in Ironman® 70.3, running and cycling were more predictive than swimming for overall race performance. There was a progressive reduction in the performance gap between men and women aged 50 years and older. This information may aid triathletes and coaches in planning their race tactics in an Ironman® 70.3 race.

www.nature.com/scientificreports/ might influence performances in the other splits as it has been suggested by research on Sprint-and Olympicdistance formats. For instance, Peeling and Landers 33 reported that a relatively low-intensity (without being slow) swimming split would result in faster cycling and running splits. Also, an analysis of the Olympic triathlon showed that competitive athletes should aim to complete each split as fast as possible and already from the swimming split to be in the "first pack" 9 . The aim of this study was, therefore, to investigate which split discipline is the best predictor for overall race time in female and male Ironman ® 70.3 age group triathletes. There are differences in the age-related performance decline in Olympic distance triathletes compared to full-distance Ironman ® triathletes. The magnitude of the age-related performance declines in cycling and running performance is greater in the full-distance Ironman ® triathlon than in the Olympic distance triathlon 17 . Because the running split is the most influential split in the Olympic distance triathlon and the cycling split the most influential in the full-distance Ironman ® triathlon, we hypothesized that the running split would be the most influential in Ironman ® 70.3. It should be highlighted that the running split in the Ironman ® 70.3 (21 km) is closer to the 10 km of the Olympic distance triathlon compared to the 42.195 km in the full-distance Ironman ® triathlon.

Materials and methods
Ethical approval. This study was approved by the Institutional Review Board of Kanton St. Gallen, Switzerland, with a waiver of the requirement for informed consent of the participants as the study involved the analysis of publicly available data (EKSG 01-06-2010). All methods were performed in accordance with the relevant guidelines and regulations. The research has been performed in accordance with the Declaration of Helsinki.
Data set and data preparation. The athletes' data were downloaded from the official Ironman ® website (https:// ironm an. com) using a Python script. The athletes' sex, age, country of origin, times for swimming, running, cycling, transition times 1 and 2, and overall race times were obtained. We analyzed successful age group finishers of all Ironman ® 70.3 races recorded on the Ironman ® website between 2004 and 2020 18 . Data considered for statistical analysis were the time of each split discipline of the Ironman ® 70.3 distance for swimming, cycling, running, and transition times (represented by transition 1-swimming to cycling and transition 2-cycling to running) and overall race times. Exclusion criteria were (i) athletes who did not start (DNS) or did not finish (DNF); (ii) disqualified athletes (DSQ); (iii) athletes with at least one missing split time; and (iv) inconsistent times (i.e., impossible split times, or final times smaller than split times, etc.). Statistical analysis. The data distribution normality was validated by plotting the histograms of each split discipline and overall race times. Descriptive characteristics were presented as means and standard deviations, along with maximum and minimum values.

Groups stratification.
Correlation analysis (Pearson and Spearman) was performed to verify the association level between the overall race time and the split times (swimming, running, cycling and the first and second transitions), separately for males and females. Correlation coefficient classification criteria with 0.0 to 0.3 as a negligible correlation, 0.3 to 0.5 as a low correlation, 0.5 to 0.7 as a moderate correlation, 0.7 to 0.9 as a high correlation and 0.9 to 1.0 as a very high correlation were used 19 . Correlation matrixes were graphically presented for both sexes.
The overall race time is the sum of the partial times. This is deterministic, nonetheless, regressions can be built around the dataset. A multiple linear regression (MLR) was built to model the association between splits, transition times, and overall race times. Scatter plots were used to illustrate these associations.
The MLR model can be represented as follows: where: ST = Swim Time, BT = Bike Time, RT = Run Time, T1T and T2T are transition times and a, b, c, d, e are the regression coefficients and K the intercept constant. Also, three univariate linear regression (ULR) models were built to try and find the "most predictive" split discipline.
where: S, B and R are the intercept constants of each model, s, b and r are the regression coefficients of each model, ST, BT and RT are the Swim Time, Bike Time and Run Time, respectively.
A two-way ANOVA was applied to compare the time performance for each age group for both female and male finishers. Results were considered significant at p < 0.05. Scatter plots and box plots were also used to illustrate the associations between split and overall race times and between sex and age groups. All statistical analyses were performed using the language Python and Google Colab notebooks. www.nature.com/scientificreports/ Institutional review board statement. This study was approved by the Institutional Review Board of Kanton St. Gallen, Switzerland, with a waiver of the requirement for informed consent of the participants as the study involved the analysis of publicly available data.

Results
A total of 823,459 race records (198,066 from women and 625,393 from men) were finalists in Ironman ® 70.3 races from 2004 to 2020. Over the years, the number of recreational (age group) athletes increased where the male-tofemale ratio remained at approximately 3.2 between 2012 and 2019, indicating a predominance of males (Fig. 1). The average and best finish times of all finishers by gender and calendar year are presented in Fig. 2. Women's best times decreased while men's best times remained flat. Average times have remained relatively flat for all whiles the number of participants soared. The distribution of finish and split times by gender is summarized in Fig. 3.
Boxplots of split times and overall race times by gender and age group are presented in Fig. 4. The results of the two-way ANOVA tests applied to overall race times and split times showed that, for each main effect (i.e., age group and gender) and their interaction the calculated p-values were zero or nearly zero. Therefore we rejected the null hypothesis and concluded that the differences between age groups and genders were statistically significant. For overall race times and split times, men were always faster than women in each age group (p < 0.05).  www.nature.com/scientificreports/ It is worth noticing how close male and female swim time boxes are, with significant overlap in the low and middle-aged groups compared to the bike or run splits (or to the overall race times). This suggests that the performance gap between men and women is smaller in the water (swimming) than on land (running and cycling). Bike times remained flatter than run and swim times through the low and middle-aged groups, suggesting the cycling split exhibits a more consistent performance across the different age groups. Figure 5 presents the correlation matrixes between overall race time and split times. Running showed a positive and slightly higher correlation with overall race time (0.89 for women and 0.9 for men, moderate to high) than cycling (0.87 for women and 0.86 for men). When univariate linear regression models are calculated, the coefficients of bike and run times are much larger than the coefficient of swim time (0.394 for swimming, 0.759 for cycling and 0.805 for running), indicating their higher influence on the overall finish time (Scatter plots for each split discipline can be seen in Fig. 6). Figure 6 presents the scatter charts of the "Finish Time" versus split times showing the linear relationship with each variable. Figure 7 summarizes the correlation coefficients (r-values) for women and men by age group. The swim correlation coefficients remain almost constant for females up to age group 40-44, before starting to decrease. For males. This decrease starts from age group 30-34. For the running and cycling disciplines, the correlation coefficients exhibit a smaller variation for both males and females.

Discussion
This study investigated the best predictive split discipline in female and male Ironman ® 70.3 age group triathletes, hypothesizing that running would be the best predictor of the overall finish time. We found, however, no differences in the regression analysis between the contributions of the swimming, cycling, and running splits. The correlation analysis showed stronger associations of cycling and running with overall race time than swimming and a more negligible difference in swimming performance between males and females. The data also showed a progressive reduction of the performance gap between men and women in age groups 50 years and older.
The influence of split disciplines has mainly been investigated in the full-distance Ironman ® triathlon. The full-distance Ironman ® triathlon has a long tradition since 1978 with the first edition in Oahu, Hawaii 5 [https:// www. ironm an. com/ histo ry]. Trends across years for the top three women and men competing in 'Ironman ® Hawaii' from 1983 to 2018 showed a significant decrease in each discipline for women and men, with cycling being the discipline with the most significant reduction. In addition, cycling was the discipline with the most significant influence on overall race time for both women and men 14 . For the full-distance Ironman ® triathlon, cycling was the most important split discipline 13,14 . When 51 male Ironman ® triathletes whom all finished a fulldistance Ironman ® triathlon below 8 h were considered, cycling was the split discipline that predicted the best overall race time, followed by running and swimming. Furthermore, cycling was the discipline with the highest performance improvement over the years 13 .
For the top three women and men competing in the Ironman World Championship (Ironman ® Hawaii) from 1983 to 2018, cycling was the discipline with the strongest influence on overall race time for both sexes 14 . For the Olympic distance triathlon, however, the running split seemed to be the most influential in overall race time A potential explanation for the importance of the cycling split in the full-distance Ironman ® triathlon might be the age-related performance decline for the three triathlon split disciplines. The age of peak performance in the Ironman ® 70.3 was higher than in the Olympic distance triathlon and lower than in the full-distance Ironman ® triathlon 20 . A previous analysis of Ironman ® 70.3 performance showed that the decline of performance with age www.nature.com/scientificreports/ differed by the split, starting the earliest in swimming and later in running and cycling 21 . However, another analysis found an earlier performance decline in running than in swimming 22 . It has been shown that the age-related decline in triathlon performance is specific to both the discipline and the distance, with cycling showing fewer declines in performance with age than swimming and running for both the Olympic distance triathlon, the full-distance Ironman ® triathlon 17 and in ultra-distance triathlons 23 . For the Ironman ® 70.3 triathlon, the age-related decline started earliest in swimming (from the first age group on) for both men and women. For running, the performance decline began at 26 years in men and 28 years in women. The latest age-related decline started in cycling at 34 years for men and at 35 years for women 16 . Regarding the three split disciplines, swimming was generally less influenced for longer triathlon distances, such as an ultra-triathlon 6 .
We also need to consider the influence of the cycling split on the running split 24 . Under laboratory conditions, highly variable power distribution in cycling can impair the 10-km triathlon run performance 24 . Furthermore, differences have been reported regarding the cycling intensity of younger and older triathletes. It has been shown that elite master triathletes achieved a higher peak power output in cycling than elite junior triathletes 25 . Under field conditions, the different disciplines seemed to influence overall performance differently. A study investigating sprint distance triathlon, Olympic distance triathlon, Ironman 70.3 ® (half-distance), and Ironman ® 140.6 (full-distance) showed that each split discipline had different importance in predicting overall race times regarding the length of a triathlon race. Swimming was the most predictive in sprint and Olympic distance triathlon, cycling in half-distance and running in full-distance Ironman 15 .
With regards to the variation of the relationship between overall and split times by age, we found a considerable decrease in the r-value for male swim times. This indicates to senior male competitors in Ironman 70.3 that, www.nature.com/scientificreports/ if they are aiming to improve the overall time, they need to train for a good bike and run times rather than for a fast swim split. The effect is not so noticeable in women, who managed to keep the swim split more relevant until the age group 40-44 years. In all cases, males and females, the run split show the highest value and flattest curve, so it is the unquestionable predicting factor from a Pearson correlation perspective. Compared to the other locomotion modes (running and cycling), swimming induces an increased workload on the upper limbs 26 ; thus, it would be reasonable to observe a weaker relationship between swimming and overall triathlon performance, which relies on the lower limbs. The factors related to the decline in sports performance include both   27 . Moreover, despite a general trend of a progressive age-related decline in world class performances in freestyle swimming, marathon running and triathlon 28 , it has been well known that muscle strength and power performances would exhibit a larger decline with aging than aerobic performances 29 . This observation was in agreement with an analysis of Olympic triathlon age group athletes at the World Championships, where a decrease of close to 5% per year after 45 years of age was observed in swimming, but a decline in performance was less pronounced in cycling until the age of 60 years 30 .
A limitation of the present study was the specificity of the triathlon format examined (Ironman ® 70.3) in terms of physiological and psychological demands 15,31,32 . Thus, caution would be needed to generalize our findings to other triathlon formats. Moreover, the number of finishers in the older age groups was smaller than that of younger age groups, which might explain some inconsistent results concerning differences between the oldest age groups (e.g. Figure 7). On the other hand, the strength of the study was that it analyzed a large database of Ironman ® 70.3 spanning 17 calendar years and including ~ 800 races and ~ 800,000 triathletes. These findings provide practical information for practitioners working with Ironman ® 70.3 athletes regarding setting optimal training goals based on sex, age and discipline. It should be noticed that the data analyzed concerned different www.nature.com/scientificreports/ races, which differed for slope, terrain, air and water temperature, wind and humidity. Thus, future studies should consider the role of these circuit characteristics and environmental conditions on the relationship of overall with split performances.

Conclusions
Age group triathletes competing in Ironman ® 70.3 races have no differences in the regression analysis regarding the contributions of the swimming, cycling and running splits to overall performance. However, the correlation analysis showed stronger associations between cycling and running than swimming with overall race times. The present findings have practical applications for coaches working with triathletes preparing for Ironman ® 70. 3. The knowledge that all disciplines were predictors of race performance would suggest that the training for this race should be balanced without neglecting any discipline. It was acknowledged that the cycling split was an excellent opportunity for fueling, and an optimal performance in this split would enable triathletes to cycle more comfortably at a higher exercise intensity (i.e., better exercise economy) and save energy for the running split.

Data availability
The athlete data was downloaded from the official Ironman website (https:// ironm an. com).